Smart Paper Notes

IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages

Karthik Puranik , Bharathi B , Senthil Kumar B

Category: Natural Language Processing

2021-11-15

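Sentiment analysis of social media posts and comments for various marketing and emotional purposes is gaining recognition. With the growing amount of code-mixed content in various native languages, dedicated research is needed to produce promising results. This paper makes a small contribution to that research in the form of sentiment analysis of code-mixed social media comments in the widely spoken Dravidian languages Kannada, Tamil, and Malayalam. It describes the work for the shared task conducted by Dravidian-CodeMix at FIRE 2021, using pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset, on a transliteration of it (TRAI), on English translations (TRAA) of the TRAI data, and on the combination of all three. The results are recorded in this paper; the best models ranked 4th, 5th, and 10th in the Tamil, Kannada, and Malayalam tasks, respectively.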


IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

Ananya B. Sai , Vignesh Nagarajan , Tanay Dixit , Raj Dabre , Anoop Kunchukuttan , Pratyush Kumar , Mitesh M. Khapra

Category: Natural Language Processing

2022-12-20

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
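
The meta-evaluation step described above, correlating annotator scores with automatic metric scores, can be illustrated with a minimal sketch. The function name `pearson` and the score lists are hypothetical and not taken from the paper, and the abstract does not say which correlation coefficient the authors report; Pearson correlation is used here only as one common choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (norm_x * norm_y)


# Hypothetical per-segment scores, for illustration only.
human_scores = [22.0, 18.5, 24.0, 10.0, 15.5]
metric_scores = [0.81, 0.70, 0.88, 0.42, 0.65]
print(f"correlation: {pearson(human_scores, metric_scores):.3f}")
```

A metric whose scores track the annotators' more closely yields a correlation nearer to 1, which is the basis on which metrics can be ranked.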


Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.


Chaotic Variational Auto Encoder based One Class Classifier for Insurance Fraud Detection

K. S. N. V. K. Gangadhar , B. Akhil Kumar , Yelleti Vivek , Vadlamani Ravi

Category: Machine Learning

2022-12-15

Of late, insurance fraud detection has assumed immense significance owing to the huge financial and reputational losses fraud entails and the phenomenal success of fraud detection techniques. Insurance is broadly divided into two categories: (i) life and (ii) non-life. Non-life insurance in turn includes health insurance and auto insurance, among other things. In either category, fraud detection techniques should be designed to capture as many fraudulent transactions as possible. Owing to the rarity of fraudulent transactions, in this paper we propose a chaotic variational autoencoder (C-VAE) to perform one-class classification (OCC) on genuine transactions. Here, we employed the logistic chaotic map to generate random noise in the latent space. The effectiveness of C-VAE is demonstrated on health insurance fraud and auto insurance datasets. We considered the vanilla variational autoencoder (VAE) as the baseline. It is observed that C-VAE outperformed VAE on both datasets. C-VAE achieved classification rates of 77.9% and 87.25% on the health and automobile insurance datasets, respectively. Further, a t-test conducted at the 1% level of significance with 18 degrees of freedom indicates that C-VAE is statistically significantly better than the VAE.
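
As a rough illustration of the idea of drawing latent noise from a logistic chaotic map, here is a minimal sketch. The abstract does not give the map parameter, the seed value, or exactly how the noise enters the model, so `r=4.0`, `x0=0.37`, the burn-in length, and the use of the chaotic sequence in place of Gaussian noise in the reparameterization step are all assumptions; the function names are hypothetical.

```python
import numpy as np


def logistic_map_noise(shape, x0=0.37, r=4.0, burn_in=100):
    """Chaotic sequence from the logistic map x <- r * x * (1 - x).

    For r = 4 and x0 in (0, 1) the iterates stay inside [0, 1].
    The parameter values are assumptions, not taken from the paper.
    """
    n = int(np.prod(shape))
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = r * x * (1.0 - x)
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = x
    return out.reshape(shape)


def sample_latent(mu, log_var, x0=0.37):
    """Reparameterization-style sample using chaotic noise (assumed usage)."""
    eps = logistic_map_noise(mu.shape, x0=x0)
    return mu + np.exp(0.5 * log_var) * eps


mu = np.zeros((2, 3))
log_var = np.zeros((2, 3))
print(sample_latent(mu, log_var))
```

Unlike Gaussian noise, this sequence is deterministic given `x0`, so the same seed value always reproduces the same latent sample.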


Explainable AI over the Internet of Things (IoT): Overview, State-of-the-Art and Future Directions

Senthil Kumar Jagatheesaperumal , Quoc-Viet Pham , Rukhsana Ruby , Zhaohui Yang , Chunmei Xu , Zhaoyang Zhang

Category: Artificial Intelligence | Machine Learning

2022-11-02

Explainable Artificial Intelligence (XAI) is transforming the field of Artificial Intelligence (AI) by enhancing the trust of end-users in machines. As the number of connected devices keeps growing, the Internet of Things (IoT) market needs to be trustworthy for end-users. However, the existing literature still lacks a systematic and comprehensive survey of the use of XAI for IoT. To bridge this gap, in this paper we address XAI frameworks with a focus on their characteristics and support for IoT. We illustrate the widely used XAI services for IoT applications, such as security enhancement, Internet of Medical Things (IoMT), Industrial IoT (IIoT), and Internet of City Things (IoCT). We also suggest implementation choices for XAI models over IoT systems in these applications with appropriate examples, and summarize the key inferences for future work. Moreover, we present cutting-edge developments in edge XAI structures and the support of sixth-generation (6G) communication services for IoT applications, along with key inferences. In a nutshell, this paper constitutes the first holistic compilation on the development of XAI-based frameworks tailored to the demands of future IoT use cases.


Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Đorđe Miladinović , Kumar Shridhar , Kushal Jain , Max B. Paulus , Joachim M. Buhmann , Carl Allen

Category: Machine Learning

2022-09-26

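In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without using the latent space, a failure known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by using the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared with uniform dropout on standard text benchmark datasets, our targeted approach improves both sequence modeling performance and the amount of information captured in the latent space.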
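
The uniform baseline mentioned above, random dropout applied to the decoder input, can be sketched as follows. This shows only the uniform variant, not the adversarial, information-based strategy the paper proposes; the function name `word_dropout`, the `<unk>` placeholder, and the dropout rate are assumptions for illustration.

```python
import random


def word_dropout(tokens, p, unk="<unk>", rng=None):
    """Replace each decoder-input token with `unk` independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    return [unk if rng.random() < p else tok for tok in tokens]


# Hypothetical decoder input; a fixed seed makes the run repeatable.
decoder_input = ["the", "movie", "was", "surprisingly", "good"]
print(word_dropout(decoder_input, p=0.4, rng=random.Random(0)))
```

With part of its input hidden, an autoregressive decoder can rely less on the preceding tokens, which pushes it to use the latent code instead.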


Towards Smart Fake News Detection Through Explainable AI

Athira A B , S D Madhu Kumar , Anu Mary Chacko

Category: Artificial Intelligence

2022-07-23

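Because of their popularity, many people now treat social media sites as their sole source of information, and most people get their news through social media. At the same time, fake news has grown exponentially on social media platforms in recent years. Several artificial intelligence-based solutions for detecting fake news have shown promising results. However, these detection systems lack explanation capabilities, that is, the ability to explain why they made a given prediction. This paper highlights the current state of the art in explainable fake news detection. We discuss the pitfalls of current explainable fake news detection models and present our ongoing research on a multi-modal explainable fake news detection model.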


Exploiting Unlabeled Data with Vision and Language Models for Object Detection

Shiyu Zhao , Zhixing Zhang , Samuel Schulter , Long Zhao , Vijay Kumar B. G , Anastasis Stathopoulos , Manmohan Chandraker , Dimitris Metaxas

Category: Computer Vision

2022-07-18

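Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, acquiring annotations for thousands of categories at large scale is prohibitively costly. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic, class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks: open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a new state of the art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/vl-plm.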
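
The pseudo-labeling step described above can be sketched under strong simplifying assumptions: region and category-text embeddings are taken as given arrays (in practice they would come from a vision and language model), each region is assigned the category with the highest cosine similarity, and low-confidence regions are discarded. The function name `pseudo_labels` and the `threshold` and `temperature` values are hypothetical and not taken from the paper or its code.

```python
import numpy as np


def pseudo_labels(region_emb, text_emb, categories, threshold=0.8, temperature=0.01):
    """Assign a category to each region by cosine similarity to text embeddings.

    Returns (region_index, category, confidence) for regions whose softmax
    confidence reaches `threshold`. All parameter values are illustrative.
    """
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (r @ t.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return [
        (int(i), categories[best[i]], float(conf[i]))
        for i in np.flatnonzero(conf >= threshold)
    ]


# Toy 2-D embeddings standing in for real model outputs.
regions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
texts = np.array([[1.0, 0.0], [0.0, 1.0]])
print(pseudo_labels(regions, texts, ["cat", "dog"]))
```

In this toy input the third region is equally similar to both categories, so it falls below the confidence threshold and receives no pseudo label.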


Lookback for Learning to Branch

Prateek Gupta , Elias B. Khalil , Didier Chételat , Maxime Gasse , Yoshua Bengio , Andrea Lodi , M. Pawan Kumar

Category: Machine Learning | Machine Learning (Statistics)

2022-06-30

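Expressive and computationally inexpensive bipartite graph neural networks (GNNs) have been shown to be an important component of deep learning-based mixed-integer linear program (MILP) solvers. Recent work has demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained offline, on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether the target heuristic exhibits strong dependencies among neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them into our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice is often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon into GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives, such as solving time, into the final models. Through extensive experiments on standard benchmark instances, we show that our proposal results in up to a 22% decrease in B&B tree size and up to a 15% improvement in solving time.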


Woodscape Fisheye Object Detection for Autonomous Driving -- CVPR 2022 OmniCV Workshop Challenge

Saravanabalagi Ramachandran , Ganesh Sistu , Varun Ravi Kumar , John McDonald , Senthil Yogamani

Category: Computer Vision

2022-06-26

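Object detection is a comprehensively studied problem in autonomous driving. However, it has been relatively less explored for fisheye cameras. Strong radial distortion breaks the translation-invariance inductive bias of convolutional neural networks. We therefore present the WoodScape fisheye object detection challenge for autonomous driving, held as part of the CVPR 2022 Workshop on Omnidirectional Computer Vision (OmniCV). This is one of the first competitions focused on fisheye camera object detection. We encouraged participants to design models that work natively on fisheye images without rectification. We used CodaLab to host the competition, based on the publicly available WoodScape fisheye dataset. In this paper, we provide a detailed analysis of the competition, which attracted 120 teams worldwide and a total of 1492 submissions. We briefly discuss the details of the winning methods and analyze their qualitative and quantitative results.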
